-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable S3 compliant data vaults using https and http #8167
Conversation
WalkthroughThe pull request introduces several enhancements and fixes across the WEBKNOSSOS application, including new features for metadata handling, improved search functionalities, and enhanced support for S3-compliant object storage. Notable changes include updates to the Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@@ -123,7 +123,7 @@ | |||
"ts-coverage": "typescript-coverage-report", | |||
"find-cyclic-dependencies": "yarn run dpdm -T --tree false --warning false --extensions .ts,.tsx frontend/javascripts/main.tsx", | |||
"check-cyclic-dependencies": "node ./tools/check-cyclic-dependencies.js", | |||
"startf": "yarn rm-fossil-lock; yarn kill-listeners; yarn start", | |||
"startf": "yarn rm-fossil-lock; yarn kill-listeners; rm -r webknossos-jni/target; yarn start", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to commit this? Seems like some that is covered by clean.sh
already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because this is always annoying me because the webknossos-jni folder causes an error after using docker compose up (e.g. e2e tests). startf is "start forcefully", and this directory in particular is sometimes leading to me not being able to start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (8)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
50-50
: Document protocol detection behavior.Given that this change enables automatic protocol detection for S3 vaults, consider adding a code comment explaining this behavior to help future maintainers understand the purpose of passing
WSClient
.+ // WSClient is used to detect if the S3 vault supports HTTPS by sending a test request + // before falling back to HTTP if necessary S3DataVault.create(remoteSource, ws)
Line range hint
41-64
: Consider refactoring scheme handling for better maintainability.The current implementation uses string comparisons for scheme detection. Consider extracting these into a sealed trait/enum for better type safety and maintainability.
Example refactor:
sealed trait VaultScheme { def value: String } object VaultScheme { case object S3 extends VaultScheme { val value = "s3" } case object HTTPS extends VaultScheme { val value = "https" } case object HTTP extends VaultScheme { val value = "http" } case object GS extends VaultScheme { val value = "gs" } case object File extends VaultScheme { val value = "file" } def fromString(scheme: String): Option[VaultScheme] = values.find(_.value == scheme) def values: Seq[VaultScheme] = Seq(S3, HTTPS, HTTP, GS, File) }package.json (1)
126-126
: Consider adding error handling to the startup script.The new
startf
script combines cleanup and startup steps, which is helpful for development workflow. However, consider making it more robust:- "startf": "yarn rm-fossil-lock; yarn kill-listeners; rm -r webknossos-jni/target; yarn start", + "startf": "yarn rm-fossil-lock; yarn kill-listeners; rm -rf webknossos-jni/target 2>/dev/null || true; yarn start",The changes:
- Added
-f
flag torm
to prevent errors if files don't exist- Added
2>/dev/null || true
to suppress errors and continue executiontest/backend/DataVaultTestSuite.scala (3)
141-149
: Enhance data integrity verification in brotli-compression tests.The current test only verifies the first 10 bytes of the decompressed data. Consider:
- Verifying the complete data integrity
- Adding test cases for different compression types
- Testing large file handling
- Adding partial read tests
180-203
: Enhance directory listing test coverage.Current tests could be more comprehensive. Consider adding test cases for:
- Empty directories
- Directories with exactly maxItems entries
- Pagination scenarios
- Special characters in paths
- Nested directory structures
Also, consider parameterizing the tests with different maxItems values instead of hardcoding 3:
Seq(0, 1, 5, 10).foreach { maxItems => s"list available directories with maxItems=$maxItems" in { // Test implementation } }
Based on the code analysis, I can now generate the final response:
Test suite needs focus on HTTPS fallback mechanism and data vault interactions
The current test suite focuses on basic data vault operations but misses critical scenarios:
- The HTTPS fallback mechanism is not explicitly tested, despite being a core feature of
HttpsDataVault
- No tests verify the automatic fallback to HTTPS when other protocols fail
- Missing edge cases for range requests and protocol switching
Suggested improvements:
- Add test cases for protocol fallback scenarios in
DataVaultTestSuite.scala
- Include tests for
supportsRangeRequests
behavior inHttpsDataVault
- Test error handling during protocol transitions
🔗 Analysis chain
Line range hint
1-277
: Consider restructuring tests for better organization and coverage.While the test suite provides good basic coverage, consider:
- Creating a dedicated test section for HTTPS fallback mechanism (PR's main objective)
- Using shared test fixtures to reduce duplication
- Organizing tests by feature rather than by request type
- Adding integration tests with Hetzner Object Storage (mentioned in PR objectives)
Let's verify the test coverage:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if there are any direct references to Hetzner storage in tests rg -i "hetzner" --type scalaLength of output: 28
Script:
#!/bin/bash # Let's check for any S3 or HTTPS fallback related code in the codebase rg -A 5 "HttpsDataVault" --type scala # Also check for any other test files related to data vaults fd ".*Test.*\.scala$" | grep -i "vault"Length of output: 5446
Script:
#!/bin/bash # Let's check the implementation of HttpsDataVault to understand the fallback mechanism ast-grep --pattern 'class HttpsDataVault $_' -A 30 # Also check for any S3 fallback related code rg "fallback" --type scalaLength of output: 21184
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)
Line range hint
131-138
: Potential Unhandled Exceptions ingetObjectSummaries
While fetching object summaries, exceptions other than
NoSuchBucketException
may not be adequately handled, potentially causing the application to crash on unexpected errors.Implement comprehensive exception handling to manage all possible exceptions from
listObjectsV2
:objectListing <- notFoundToFailure(client.listObjectsV2(listObjectsRequest).asScala).recoverWith { case exception: SomeSpecificExceptionType => // Handle specific exception Fox.failure(exception) }
232-233
: Sequential Dependency on Protocol DeterminationThe construction of the S3 client depends on the result of
determineProtocol
, which might introduce latency due to the network call. This could be problematic in high-throughput scenarios.Consider decoupling client creation from protocol determination or caching the result to minimize repeated network calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
CHANGELOG.unreleased.md
(1 hunks)package.json
(1 hunks)test/backend/DataVaultTestSuite.scala
(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
(7 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
(1 hunks)
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
Line range hint 41-64
: Implementation aligns well with PR objectives.
The changes effectively address the Hetzner Object Storage compatibility issue by:
- Adding protocol detection capability through WSClient
- Maintaining consistent error handling
- Preserving backward compatibility with other vault types
Line range hint 1-29
: Verify WSClient lifecycle management.
The injected WSClient
is used across multiple vault types. Ensure proper cleanup of WSClient resources when the service is terminated.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)
227-239
: AWS Region Parsing May Fail for Custom Endpoints
The call to AwsHostNameUtils.parseSigningRegion
may return None
for custom or non-standard endpoints, leading to a default region of US_EAST_1
. This might cause authentication failures with certain S3-compatible services.
Verify that the region is correctly determined for all intended endpoints. If necessary, allow configuration of the region or implement a fallback mechanism.
227-239
: Ensure Thread Safety in Asynchronous Client Creation
The getAmazonS3Client
method assembles the S3 client asynchronously. If multiple threads access clientFox
, there might be potential race conditions during client initialization.
Please verify that clientFox
is thread-safe and properly handles concurrent access. Run the following script to check for any shared mutable state that could lead to race conditions:
@@ -38,6 +38,7 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released | |||
- Users without edit permissions to a dataset can no longer delete sharing tokens via the API. [#8083](https://github.com/scalableminds/webknossos/issues/8083) | |||
- Fixed downloading task annotations of teams you are not in, when accessing directly via URI. [#8155](https://github.com/scalableminds/webknossos/pull/8155) | |||
- Deleting a bounding box is now possible independently of a visible segmentation layer. [#8164](https://github.com/scalableminds/webknossos/pull/8164) | |||
- S3-compliant object storages can now be accessed via HTTPS. [#8167](https://github.com/scalableminds/webknossos/pull/8167) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance and relocate the changelog entry.
The changelog entry should be moved to the "Added" section since it represents a new feature rather than a fix. Additionally, the description should be expanded to better explain the functionality.
Apply this diff to improve the changelog entry:
- - S3-compliant object storages can now be accessed via HTTPS. [#8167](https://github.com/scalableminds/webknossos/pull/8167)
+ ### Added
+ - Added automatic protocol detection for S3-compliant object storages, allowing seamless access via both HTTPS and HTTP. The system now automatically determines the supported protocol by first attempting HTTPS and falling back to HTTP if necessary. This improves compatibility with various object storage providers, including Hetzner Object Storage. [#8167](https://github.com/scalableminds/webknossos/pull/8167)
Committable suggestion skipped: line range outside the PR's diff.
WsTestClient.withClient { ws => | ||
val vaultPath = | ||
new VaultPath(uri, S3DataVault.create(RemoteSourceDescriptor(uri, None), ws)(globalExecutionContext)) | ||
val bytes = | ||
(vaultPath / "s0/5/5/5").readBytes(Some(range))(globalExecutionContext).get(handleFoxJustification) | ||
assert(bytes.length == range.length) | ||
assert(bytes.take(10).sameElements(Array(0, 0, 0, 3, 0, 0, 0, 64, 0, 0))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add test coverage for HTTPS fallback mechanism and edge cases.
While the basic range request test is good, consider adding test cases for:
- HTTPS fallback mechanism (key PR objective)
- Invalid range requests (e.g., overlapping ranges, out-of-bounds)
- Edge cases (e.g., range boundaries, empty ranges)
Example test structure:
"when vault supports HTTPS" should {
"fallback to HTTPS successfully" in {
WsTestClient.withClient { ws =>
val uri = new URI("http://https-supporting-bucket.example.com")
val vaultPath = new VaultPath(uri, S3DataVault.create(RemoteSourceDescriptor(uri, None), ws))
// Test HTTPS fallback
}
}
}
WsTestClient.withClient { ws => | ||
val s3DataVault = S3DataVault.create(RemoteSourceDescriptor(uri, None), ws)(globalExecutionContext) | ||
val vaultPath = new VaultPath(uri, s3DataVault) | ||
val result = vaultPath.readBytes()(globalExecutionContext).await(handleFoxJustification) | ||
assertBoxEmpty(result) | ||
} | ||
} | ||
} | ||
|
||
"return empty box" when { | ||
"requesting a non-existent object in existent bucket" in { | ||
val uri = new URI(s"s3://open-neurodata/non-existent-object${UUID.randomUUID}") | ||
val s3DataVault = S3DataVault.create(RemoteSourceDescriptor(uri, None)) | ||
val vaultPath = new VaultPath(uri, s3DataVault) | ||
val result = vaultPath.readBytes()(globalExecutionContext).await(handleFoxJustification) | ||
assertBoxEmpty(result) | ||
WsTestClient.withClient { ws => | ||
val s3DataVault = S3DataVault.create(RemoteSourceDescriptor(uri, None), ws)(globalExecutionContext) | ||
val vaultPath = new VaultPath(uri, s3DataVault) | ||
val result = vaultPath.readBytes()(globalExecutionContext).await(handleFoxJustification) | ||
assertBoxEmpty(result) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Expand error handling test coverage.
While basic error cases are covered, consider adding tests for:
- Network errors (connection refused, timeout)
- Permission denied scenarios
- Rate limiting responses
- Timeout handling
Example test:
"handle network errors gracefully" in {
WsTestClient.withClient { ws =>
// Mock WS client to simulate network errors
val mockWs = // ... setup mock
val uri = new URI("s3://example-bucket")
val s3DataVault = S3DataVault.create(RemoteSourceDescriptor(uri, None), mockWs)
val vaultPath = new VaultPath(uri, s3DataVault)
val result = vaultPath.readBytes()(globalExecutionContext).await(handleFoxJustification)
assertBoxFailure(result)
}
}
class S3DataVault(s3AccessKeyCredential: Option[S3AccessKeyCredential], | ||
uri: URI, | ||
ws: WSClient, | ||
implicit val ec: ExecutionContext) | ||
extends DataVault { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential Performance Impact During Initialization
The constructor now includes a network call to determine the protocol using WSClient
. This synchronous call during initialization can introduce latency and block the thread, potentially affecting application startup time and scalability.
Consider deferring the protocol determination until it is actually needed or making the network call asynchronously to avoid blocking the main execution thread.
private def determineProtocol(uri: URI, ws: WSClient)(implicit ec: ExecutionContext): Fox[String] = { | ||
// If the endpoint supports HTTPS, use it. Otherwise, use HTTP. | ||
val httpsUri = new URI("https", uri.getAuthority, "", "", "") | ||
val httpsFuture = ws.url(httpsUri.toString).get() | ||
|
||
val protocolFuture = httpsFuture.transformWith({ | ||
case TrySuccess(_) => Future.successful("https") | ||
case TryFailure(_) => Future.successful("http") | ||
}) | ||
for { | ||
protocol <- protocolFuture.toFox | ||
} yield protocol | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Robustness of Protocol Determination Logic
The determineProtocol
method relies on attempting an HTTPS request to infer protocol support. This approach may not be reliable due to potential transient network issues or endpoints that respond slowly, leading to false negatives.
Consider implementing a more reliable mechanism for protocol detection, such as:
- Configurable protocol preference with fallback options.
- Checking endpoint capabilities using a dedicated health-check endpoint if available.
- Introducing retries with backoff strategies to handle transient failures.
private lazy val clientFox: Fox[S3AsyncClient] = | ||
S3DataVault.getAmazonS3Client(s3AccessKeyCredential, uri, ws) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error Handling for Client Initialization Failures
The clientFox
lazy initialization wraps getAmazonS3Client
which can fail. If the initialization fails, subsequent calls will also fail without a clear indication.
Ensure that failures during client initialization are properly propagated and handled. Consider eagerly initializing the client and handling errors to prevent silent failures during runtime.
def create(remoteSourceDescriptor: RemoteSourceDescriptor, ws: WSClient)( | ||
implicit ec: ExecutionContext): S3DataVault = { | ||
val credential = remoteSourceDescriptor.credential.flatMap { | ||
case f: S3AccessKeyCredential => Some(f) | ||
case f: LegacyDataVaultCredential => Some(f.toS3AccessKey) | ||
case _ => None | ||
} | ||
new S3DataVault(credential, remoteSourceDescriptor.uri) | ||
new S3DataVault(credential, remoteSourceDescriptor.uri, ws, ec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constructor Change May Break Existing Instantiations
The create
method signature has changed to include WSClient
. This might affect other parts of the codebase that instantiate S3DataVault
without this parameter.
Ensure all instantiations of S3DataVault
across the codebase are updated accordingly. Update documentation and constructor references.
|
||
val protocolFuture = httpsFuture.transformWith({ | ||
case TrySuccess(_) => Future.successful("https") | ||
case TryFailure(_) => Future.successful("http") | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Exception Handling May Mask Underlying Issues
In the determineProtocol
method, exceptions during the HTTPS request are caught broadly, and any failure results in falling back to HTTP without logging the cause. This can mask underlying issues such as misconfigured SSL certificates or network problems.
Refine the exception handling to log the specific reasons for HTTPS failures. This can aid in debugging and ensure that genuine issues are not overlooked.
+ import play.api.Logger
...
val protocolFuture = httpsFuture.transformWith {
case TrySuccess(_) => Future.successful("https")
case TryFailure(exception) =>
+ Logger.warn(s"HTTPS request failed: ${exception.getMessage}")
Future.successful("http")
}
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
URL of deployed dev instance (used for testing):
Weird things happen when querying an object storage that supports HTTPS via HTTP. This PR first sends a get request to a datavault to check if it supports HTTPs and then uses it.
This way both HTTP and HTTPS should be supported (as sending HTTPS to a HTTP vault will cause the request to fail because of an SSL error).
This did not come up with #7453, since it was only tested on local (http) minio object storage.
Steps to test:
Issues:
(Please delete unneeded items, merge only when none are left open)
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Chores